Exploring Multidimensional Continuous Feature Space to Extract Relevant Words
Abstract
With growing amounts of text data, descriptive metadata becomes more crucial for processing that data efficiently. One kind of such metadata is keywords, which we encounter, for example, in everyday browsing of web pages. Such metadata can be of benefit in various scenarios, such as web search or content-based recommendation. We study the keyword extraction problem from the perspective of vector space and present a novel method for extracting relevant words from an article, in which we represent each word and phrase of the article as a vector of its latent features. We evaluate our method on a text categorisation problem using the well-known 20-newsgroups dataset and achieve state-of-the-art results.
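The abstract does not say how relevance is computed from the word vectors, so the following is only an illustrative sketch and not the authors' method. It assumes, purely for illustration, that each word already has a small latent-feature vector (the values in `toy_vectors` are made up) and that a word's relevance can be approximated by its cosine similarity to the centroid of all word vectors in the article. The names `cosine`, `rank_words`, and `toy_vectors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_words(word_vectors):
    """Rank words by similarity to the centroid of all word vectors.

    Assumption: the centroid stands in for the article's overall topic.
    The source abstract does not specify this scoring rule.
    """
    dim = len(next(iter(word_vectors.values())))
    count = len(word_vectors)
    centroid = [
        sum(vec[i] for vec in word_vectors.values()) / count for i in range(dim)
    ]
    scored = [(word, cosine(vec, centroid)) for word, vec in word_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional "latent feature" vectors for the words of a toy article.
toy_vectors = {
    "engine": [0.9, 0.1, 0.0],
    "car": [0.8, 0.2, 0.1],
    "wheel": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

ranking = rank_words(toy_vectors)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy setting the three vehicle-related words cluster near the centroid and the off-topic word falls to the bottom of the ranking. In practice the vectors would come from a trained embedding model rather than hand-written values.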
Similar resources
A Linked Feature Space Approach to Exploring LIDAR Data
A typical approach to exploring Light Detection and Ranging (LIDAR) datasets is to extract features using pre-defined segmentation algorithms. However, this approach only provides a limited set of features that users can investigate. To expand and represent the rich information inside the LIDAR data, we introduce a linked feature space concept that allows users to make regular, conjunctive, and...
Mind and Artifact: A Multidimensional Matrix for Exploring Cognition-Artifact Relations
What are the possible varieties of cognition-artifact relations, and which dimensions are relevant for exploring these varieties? This question is answered in two steps. First, three levels of functional and informational integration between human agent and cognitive artifact are distinguished. These levels are based on the degree of interactivity and direction of information flow, and range fro...
Discovering Hidden Interests from Twitter for Multidimensional Analysis
With the popularity of social networks, Twitter has become one of the dominant providers of massive quantities of information. Exploring the distributions and correlations from Twitter data helps accurate personalized recommendations. Online Analytical Processing, or OLAP, provides an intuitive form that is suitable for exploring Twitter data. Unfortunately, the traditional OLAP approaches can ...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Using a bag of Words for Automatic Medical Image Annotation with a Latent Semantic
In this paper we present a new approach to the automatic annotation of medical images. It uses a "bag-of-words" representation of the visual content of the medical image, combined with tf.idf-based text descriptors and reduced by latent semantic analysis, to extract the co-occurrence between terms and visual terms. A medical report is composed of a text describing a medical image. First, ...
Publication date: 2014